DIADEM is R package for differential analysis of Hi-C data. It takes advantage of significant linear correlations of main diagonals between different Hi-C data sets (cell lines, experiments, etc.) - usually the first 100 diagonals from the main diagonal are considered. DIADEM uses GLM to model these dependancies and then quantifies deviations from the model in a probabilistic way.
This vignette contains the description of our method:
If you want to spare yourself extra reading and jump straight to analysis part, please see the quick start chapter. It contains a step-by-step introduction to Hi-C differential analysis with our package using an example Hi-C data set.
Load the package:
library("DIADEM")
To compare Hi-C experiments, you will need 2 files with Hi-C matrices. In this tutorial, we will use Hi-C data sets of human IMR90 and human MSC cells lines from GSE63525 and GSE52457 studies respectively available as sample data in DIADEM package. For simplicity we will only use chromosome 18. First, we read the data:
mtx.fname.imr90 <- system.file("extdata", "IMR90-MboI-1_40kb-raw.npz",
package = "DIADEM", mustWork = TRUE)
mtx.fname.msc <- system.file("extdata", "MSC-HindIII-1_40kb-raw.npz",
package = "DIADEM", mustWork = TRUE)
hic.comparator <- HiCcomparator(mtx.fname.imr90, mtx.fname.msc, mtx.names = c("18"), do.pca = TRUE)
Illustrate Hi-C contact maps with A/B compartments (feel free to adjust margins as you want):
dense.imr90 <- sparse2dense(hic.comparator$maps1[["18"]][c("i","j","val")],
N = hic.comparator$maps.dims[["18"]][1,"n.rows"])
dense.msc <- sparse2dense(hic.comparator$maps2[["18"]][c("i","j","val")],
N = hic.comparator$maps.dims[["18"]][2,"n.rows"])
plot.margins <- 0.1
n.plots <- 4
par(mfrow = c(2,2), mai = rep(plot.margins, n.plots))
plot_contact_map(dense.imr90)
plot_contact_map(dense.msc)
plot_pc_vector(hic.comparator$pc1.maps1[["18"]])
plot_pc_vector(hic.comparator$pc1.maps2[["18"]])
Figure 1.1: IMR90 and MSC contact maps with A/B compartments of human chromosome 18.
Determine TADs for both Hi-C maps and illustrate them:
tads.imr90 <- map2tads(dense.imr90, resolution = 40*1000, window.bp = 400*1000, delta.bp = 80*1000)
tads.msc <- map2tads(dense.msc, resolution = 40*1000, window.bp = 400*1000, delta.bp = 80*1000)
plot_with_inset(list(dense.imr90), c(600,900), c(600,900), args.regions = list(tads.imr90))
plot_with_inset(list(dense.msc), c(600,900), c(600,900), args.regions = list(tads.msc))
Figure 1.2: IMR90 contact map of human chromosome 18 with TADs.
Figure 1.3: MSC contact map of human chromosome 18 with TADs.
Compare coverage and decays for IMR90 and MSC, for decays use log10 scales for both X and Y axis:
library("ggplot2")
coverage <- dominating_signal(hic.comparator, which.signal = "coverage")
ggplot(coverage, aes(x = i, y = sum.contacts, color = dataset)) +
geom_point(size = 0.5) +
geom_smooth(alpha = 0.5) +
facet_wrap(~ name, ncol = 1, scales = "free") +
theme(legend.position = "bottom")
decay <- dominating_signal(hic.comparator, which.signal = "decay")
ggplot(decay[decay$diagonal != 0,], aes(x = diagonal, y = mean.contacts, color = dataset)) +
geom_point(size = 0.5) +
scale_x_log10() +
scale_y_log10() +
facet_wrap(~ name, ncol = 1, scales = "free") +
theme(legend.position = "bottom")
Figure 1.4: Coverage comparison between IMR90 and MSC Hi-C data of human chromosome 18.
Figure 1.5: Decay comparison between IMR90 and MSC Hi-C data of human chromosome 18.
First, calculate the fold change and differential maps IMR90 / MSC and IMR90 - MSC respectively. Then, illustrate results with zoomin of 600 - 900 bins region. By default, when plotting fold change maps log10 scale is used. When plotting the differential map, one should indicate square root transformation of data for better visibility (see examples below):
merged <- merge(hic.comparator)
merged.18 <- merged[["18"]]
merged.18$fc <- merged.18$val.x / merged.18$val.y
merged.18$difference <- merged.18$val.x - merged.18$val.y
dense.fc <- sparse2dense(merged.18[c("i","j","fc")], N = hic.comparator$maps.dims[["18"]][1,"n.rows"])
dense.diff <- sparse2dense(merged.18[c("i","j","difference")],
N = hic.comparator$maps.dims[["18"]][1,"n.rows"])
plot_with_inset(list(dense.fc, fc = TRUE, colors.pal = c("blue","white","red")), c(600,900), c(600,900), which.map = "contact.map")
plot_with_inset(list(dense.diff, breaks = 100, sqrt.transform = TRUE), c(600,900), c(600,900), which.map = "diff.map")
Figure 1.6: Fold Change map of IMR90 / MSC of human chromosome 18.
Figure 1.7: Differential map of IMR90 - MSC of human chromosome 18.
Check the correlations between corresponding diagonals of IMR90 and MSC:
library("reshape2")
decay.cors <- decay_correlation(hic.comparator)
# wide to long
decay.cors.long <- reshape2::melt(decay.cors[c("name","diagonal","pcc","rho","tau")],
id.vars = c("name","diagonal"), variable.name = "correlation",
value.name = "coefficient")
# remove 0 diagonal (as it is non informative anyways) and illustrate results
ggplot(decay.cors.long[decay.cors.long$diagonal != 0,],
aes(x = diagonal, y = coefficient, color = correlation)) +
geom_point(size = 0.7) +
scale_x_continuous(breaks = c(1,seq(0, max(decay.cors.long$diagonal), by = 250)[-1])) +
facet_wrap(~ name, ncol = 1, scales = "free") +
theme(legend.position = "bottom")
dc <- decay.cors[c("name","diagonal","pearson.pval","spearman.pval","kendall.pval")]
colnames(dc) <- c("name","diagonal","pearson","spearman","kendall")
decay.sig.long <- reshape2::melt(dc, id.vars = c("name","diagonal"),
variable.name = "correlation", value.name = "pval")
decay.sig.long$neg.log.pval <- -log10(decay.sig.long$pval)
ggplot(decay.sig.long[decay.sig.long$diagonal != 0,],
aes(x = diagonal, y = neg.log.pval, color = correlation)) +
geom_point(size = 0.7) +
geom_hline(yintercept = -log10(0.01)) +
annotate("text", 1800, -log10(0.01), vjust = -1, label = "pval = 0.01") +
scale_x_continuous(breaks = c(1,seq(0, max(decay.sig.long$diagonal), by = 250)[-1])) +
ylab("-log10(pval)") +
facet_wrap(~ name, ncol = 1, scales = "free") +
theme(legend.position = "bottom")
Figure 1.8: Correlations between diagonals of IMR90 and MSC Hi-C data of human chromosome 18.
Figure 1.9: Significances of correlations between diagonals of IMR90 and MSC Hi-C data of human chromosome 18.
Construct Hi-C glm models:
hic.glm <- HiCglm(hic.comparator, diagonals = 1:150)
Plot expected vs observed plot of E[Y = MSC | X = IMR90]:
plots.eo <- plotHiCglm(hic.glm, ncol = 10, diagonals = as.character(1:100))
plots.eo[["18"]][["Y | X"]]